[ES-1804970] Fix CloudFetch returning stale column names from cached results#346
Open
sreekanth-db wants to merge 2 commits into databricks:main from
When the server result cache serves Arrow IPC files from a prior query, the embedded schema contains stale column aliases. The Go driver's CloudFetch path read these stale names directly, while the local path already used the authoritative schema from `GetResultSetMetadata`. Pass the authoritative schema bytes into `NewCloudBatchIterator` and replace stale column names on deserialized records using `array.NewRecord`, which is zero-copy (it shares the underlying column data).

Co-authored-by: Isaac
Signed-off-by: Sreekanth Vadigi &lt;sreekanth.vadigi@databricks.com&gt;
Summary
Fixes a bug where `arrow.Record.Schema()` returns stale column aliases when CloudFetch serves cached Arrow IPC files from a structurally identical prior query with different `AS` aliases.

Root cause: `NewCloudBatchIterator` was not receiving the authoritative schema bytes from `GetResultSetMetadata`, unlike the local batch path, which already had them. CloudFetch Arrow IPC files have column names baked in from the original query, and the driver was reading them as-is.

The fix: pass `arrowSchemaBytes` (the authoritative schema from `GetResultSetMetadata`) into `NewCloudBatchIterator`. After records are deserialized from the IPC stream, replace the stale schema with the authoritative one using `array.NewRecord()` (zero-copy — shares underlying column data, only swaps metadata).

Changes
- `arrowRecordIterator.go` — Pass `ri.arrowSchemaBytes` to `NewCloudBatchIterator` in `newBatchIterator()`
- `arrowRows.go` — Pass `schemaBytes` to `NewCloudBatchIterator` in `NewArrowRowScanner()`
- `batchloader.go` — Core fix:
  - `NewCloudBatchIterator` accepts `arrowSchemaBytes`, parses it into an `*arrow.Schema`, and stores it on `batchIterator`
  - `batchIterator.Next()` applies the override schema to CloudFetch records only (the local path is untouched; `overrideSchema` is `nil`)
  - New `schemaFromIPCBytes()` helper; failures are logged at `Warn` level
- `batchloader_test.go` — Added `TestCloudFetchSchemaOverride` with two subtests:
  - stale names `["id","name"]` are overridden to `["x","y"]`
  - `nil` schema bytes pass through original names unchanged

Who is affected
Go driver users with CloudFetch enabled (`WithCloudFetch(true)`) who read `arrow.Record.Schema()` directly. Python, ODBC, and JDBC drivers are not affected.

Test plan
- Unit tests (`internal/rows/arrowbased/`): `TestCloudFetchSchemaOverride` covers the override and no-override paths
- Manual E2E against `samples.tpch.lineitem` (~30M rows) with two queries differing only in column aliases — confirmed `arrow.Record.Schema()` now returns the correct aliases

This pull request was AI-assisted by Isaac.